Goto

Collaborating Authors

 Arequipa Province


Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Omnilingual ASR team, null, Keren, Gil, Kozhevnikov, Artyom, Meng, Yen, Ropers, Christophe, Setzler, Matthew, Wang, Skyler, Adebara, Ife, Auli, Michael, Balioglu, Can, Chan, Kevin, Cheng, Chierh, Chuang, Joe, Droof, Caley, Duppenthaler, Mark, Duquenne, Paul-Ambroise, Erben, Alexander, Gao, Cynthia, Gonzalez, Gabriel Mejia, Lyu, Kehan, Miglani, Sagar, Pratap, Vineel, Sadagopan, Kaushik Ram, Saleem, Safiyyah, Turkatenko, Arina, Ventayol-Boada, Albert, Yong, Zheng-Xin, Chung, Yu-An, Maillard, Jean, Moritz, Rashel, Mourachko, Alexandre, Williamson, Mary, Yates, Shireen

arXiv.org Artificial Intelligence

Automatic speech recognition (ASR) has advanced in high-resource languages, but most of the world's 7,000+ languages remain unsupported, leaving thousands of long-tail languages behind. Expanding ASR coverage has been costly and limited by architectures that restrict language support, making extension inaccessible to most--all while entangled with ethical concerns when pursued without community collaboration. To transcend these limitations, we introduce Omnilingual ASR, the first large-scale ASR system designed for extensibility. Omnilingual ASR enables communities to introduce unserved languages with only a handful of data samples. It scales self-supervised pre-training to 7B parameters to learn robust speech representations and introduces an encoder-decoder architecture designed for zero-shot generalization, leveraging a LLM-inspired decoder. This capability is grounded in a massive and diverse training corpus; by combining breadth of coverage with linguistic variety, the model learns representations robust enough to adapt to unseen languages. Incorporating public resources with community-sourced recordings gathered through compensated local partnerships, Omnilingual ASR expands coverage to over 1,600 languages, the largest such effort to date--including over 500 never before served by ASR. Automatic evaluations show substantial gains over prior systems, especially in low-resource conditions, and strong generalization. We release Omnilingual ASR as a family of models, from 300M variants for low-power devices to 7B for maximum accuracy. We reflect on the ethical considerations shaping this design and conclude by discussing its societal impact. In particular, we highlight how open-sourcing models and tools can lower barriers for researchers and communities, inviting new forms of participation. Open-source artifacts are available at https://github.com/facebookresearch/omnilingual-asr.


A short methodological review on social robot navigation benchmarking

Chhetri, Pranup, Torrejon, Alejandro, Eslava, Sergio, Manso, Luis J.

arXiv.org Artificial Intelligence

Social Robot Navigation is the skill that allows robots to move efficiently in human-populated environments while ensuring safety, comfort, and trust. Unlike other areas of research, the scientific community has not yet achieved an agreement on how Social Robot Navigation should be benchmarked. This is notably important, as the lack of a de facto standard to benchmark Social Robot Navigation can hinder the progress of the field and may lead to contradicting conclusions. Motivated by this gap, we contribute with a short review focused exclusively on benchmarking trends in the period from January 2020 to July 2025. Of the 130 papers identified by our search using IEEE Xplore, we analysed the 85 papers that met the criteria of the review. This review addresses the metrics used in the literature for benchmarking purposes, the algorithms employed in such benchmarks, the use of human surveys for benchmarking, and how conclusions are drawn from the benchmarking results, when applicable.


Cross-Document Cross-Lingual NLI via RST-Enhanced Graph Fusion and Interpretability Prediction

Yuan, Mengying, Wang, Wenhao, Wang, Zixuan, Huang, Yujie, Wei, Kangli, Li, Fei, Teng, Chong, Ji, Donghong

arXiv.org Artificial Intelligence

Natural Language Inference (NLI) is a fundamental task in natural language processing. While NLI has developed many sub-directions such as sentence-level NLI, document-level NLI and cross-lingual NLI, Cross-Document Cross-Lingual NLI (CDCL-NLI) remains largely unexplored. In this paper, we propose a novel paradigm: CDCL-NLI, which extends traditional NLI capabilities to multi-document, multilingual scenarios. To support this task, we construct a high-quality CDCL-NLI dataset including 25,410 instances and spanning 26 languages. To address the limitations of previous methods on CDCL-NLI task, we further propose an innovative method that integrates RST-enhanced graph fusion with interpretability-aware prediction. Our approach leverages RST (Rhetorical Structure Theory) within heterogeneous graph neural networks for cross-document context modeling, and employs a structure-aware semantic alignment based on lexical chains for cross-lingual understanding. For NLI interpretability, we develop an EDU (Elementary Discourse Unit)-level attribution framework that produces extractive explanations. Extensive experiments demonstrate our approach's superior performance, achieving significant improvements over both conventional NLI models as well as large language models. Our work sheds light on the study of NLI and will bring research interest on cross-document cross-lingual context understanding, hallucination elimination and interpretability inference. Our code and datasets are available at "https://github.com/Leonardo123-ui/CDCL_NLI" for peer review.


Geolog-IA: Conversational System for Academic Theses

Pozo, Micaela Fuel, Saltos, Andrea Guatumillo, Llumiquinga, Yeseña Tipan, Aguirre, Kelly Lascano, Jara, Marilyn Castillo, Mejia-Escobar, Christian

arXiv.org Artificial Intelligence

This study presents the development of Geolog-IA, a novel conversational system based on artificial intelligence that responds naturally to questions about geology theses from the Central University of Ecuador. Our proposal uses the Llama 3.1 and Gemini 2.5 language models, which are complemented by a Retrieval Augmented Generation (RAG) architecture and an SQLite database. This strategy allows us to overcome problems such as hallucinations and outdated knowledge. The evaluation of Geolog-IA's performance with the BLEU metric reaches an average of 0.87, indicating high consistency and accuracy in the responses generated. The system offers an intuitive, web-based interface that facilitates interaction and information retrieval for directors, teachers, students, and administrative staff at the institution. This tool can be a key support in education, training, and research and establishes a basis for future applications in other disciplines.


Rubric Is All You Need: Enhancing LLM-based Code Evaluation With Question-Specific Rubrics

Pathak, Aditya, Gandhi, Rachit, Uttam, Vaibhav, Devansh, null, Nakka, Yashwanth, Jindal, Aaryan Raj, Ghosh, Pratyush, Ramamoorthy, Arnav, Verma, Shreyash, Mittal, Aditya, Ased, Aashna, Khatri, Chirag, Challa, Jagat Sesh, Kumar, Dhruv

arXiv.org Artificial Intelligence

Since the disruption in LLM technology brought about by the release of GPT-3 and ChatGPT, LLMs have shown remarkable promise in programming-related tasks. While code generation remains a popular field of research, code evaluation using LLMs remains a problem with no conclusive solution. In this paper, we focus on LLM-based code evaluation and attempt to fill in the existing gaps. We propose multi-agentic novel approaches using question-specific rubrics tailored to the problem statement, arguing that these perform better for logical assessment than the existing approaches that use question-agnostic rubrics. To address the lack of suitable evaluation datasets, we introduce two datasets: a Data Structures and Algorithms dataset containing 150 student submissions from a popular Data Structures and Algorithms practice website, and an Object Oriented Programming dataset comprising 80 student submissions from undergraduate computer science courses. In addition to using standard metrics (Spearman Correlation, Cohen's Kappa), we additionally propose a new metric called as Leniency, which quantifies evaluation strictness relative to expert assessment. Our comprehensive analysis demonstrates that question-specific rubrics significantly enhance logical assessment of code in educational settings, providing better feedback aligned with instructional goals beyond mere syntactic correctness.


FedPref: Federated Learning Across Heterogeneous Multi-objective Preferences

Hartmann, Maria, Danoy, Grégoire, Bouvry, Pascal

arXiv.org Artificial Intelligence

Federated Learning (FL) is a distributed machine learning strategy, developed for settings where training data is owned by distributed devices and cannot be shared. FL circumvents this constraint by carrying out model training in distribution. The parameters of these local models are shared intermittently among participants and aggregated to enhance model accuracy. This strategy has been rapidly adopted by the industry in efforts to overcome privacy and resource constraints in model training. However, the application of FL to real-world settings brings additional challenges associated with heterogeneity between participants. Research into mitigating these difficulties in FL has largely focused on only two types of heterogeneity: the unbalanced distribution of training data, and differences in client resources. Yet more types of heterogeneity are becoming relevant as the capability of FL expands to cover more complex problems, from the tuning of LLMs to enabling machine learning on edge devices. In this work, we discuss a novel type of heterogeneity that is likely to become increasingly relevant in future applications: this is preference heterogeneity, emerging when clients learn under multiple objectives, with different importance assigned to each objective on different clients. In this work, we discuss the implications of this type of heterogeneity and propose FedPref, a first algorithm designed to facilitate personalised FL in this setting. We demonstrate the effectiveness of the algorithm across different problems, preference distributions and model architectures. In addition, we introduce a new analytical point of view, based on multi-objective metrics, for evaluating the performance of FL algorithms in this setting beyond the traditional client-focused metrics. We perform a second experimental analysis based in this view, and show that FedPref outperforms compared algorithms.


MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge

He, Jie, Hu, Nan, Long, Wanqiu, Chen, Jiaoyan, Pan, Jeff Z.

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated impressive capabilities in various reasoning tasks but face significant challenges with complex, knowledge-intensive multi-hop queries, particularly those involving new or long-tail knowledge. Existing benchmarks often fail to fully address these challenges. To bridge this gap, we introduce MINTQA (Multi-hop Question Answering on New and Tail Knowledge), a comprehensive benchmark to evaluate LLMs' capabilities in multi-hop reasoning across four critical dimensions: question handling strategy, sub-question generation, retrieval-augmented generation, and iterative or dynamic decomposition and retrieval. MINTQA comprises 10,479 question-answer pairs for evaluating new knowledge and 17,887 pairs for assessing long-tail knowledge, with each question equipped with corresponding sub-questions and answers. Our systematic evaluation of 22 state-of-the-art LLMs on MINTQA reveals significant limitations in their ability to handle complex knowledge base queries, particularly in handling new or unpopular knowledge. Our findings highlight critical challenges and offer insights for advancing multi-hop reasoning capabilities. The MINTQA benchmark is available at https://github.com/probe2/multi-hop/.


Exploring the Design Space of Cognitive Engagement Techniques with AI-Generated Code for Enhanced Learning

Kazemitabaar, Majeed, Huang, Oliver, Suh, Sangho, Henley, Austin Z., Grossman, Tovi

arXiv.org Artificial Intelligence

Novice programmers are increasingly relying on Large Language Models (LLMs) to generate code for learning programming concepts. However, this interaction can lead to superficial engagement, giving learners an illusion of learning and hindering skill development. To address this issue, we conducted a systematic design exploration to develop seven cognitive engagement techniques aimed at promoting deeper engagement with AI-generated code. In this paper, we describe our design process, the initial seven techniques and results from a between-subjects study (N=82). We then iteratively refined the top techniques and further evaluated them through a within-subjects study (N=42). We evaluate the friction each technique introduces, their effectiveness in helping learners apply concepts to isomorphic tasks without AI assistance, and their success in aligning learners' perceived and actual coding abilities. Ultimately, our results highlight the most effective technique: guiding learners through the step-by-step problem-solving process, where they engage in an interactive dialog with the AI, prompting what needs to be done at each stage before the corresponding code is revealed.


AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers

Wuttke, Alexander, Aßenmacher, Matthias, Klamm, Christopher, Lang, Max M., Würschinger, Quirin, Kreuter, Frauke

arXiv.org Artificial Intelligence

Traditional methods for eliciting people's opinions face a trade-off between depth and scale: structured surveys enable large-scale data collection but limit respondents' ability to express unanticipated thoughts in their own words, while conversational interviews provide deeper insights but are resource-intensive. This study explores the potential of replacing human interviewers with large language models (LLMs) to conduct scalable conversational interviews. Our goal is to assess the performance of AI Conversational Interviewing and to identify opportunities for improvement in a controlled environment. We conducted a small-scale, in-depth study with university students who were randomly assigned to be interviewed by either AI or human interviewers, both employing identical questionnaires on political topics. Various quantitative and qualitative measures assessed interviewer adherence to guidelines, response quality, participant engagement, and overall interview efficacy. The findings indicate the viability of AI Conversational Interviewing in producing quality data comparable to traditional methods, with the added benefit of scalability. Based on our experiences, we present specific recommendations for effective implementation.


Design and Implementation of Automatic Assisted Aiming System For Robomaster EP Based on YOLOv5

Qin, Junjia, Xu, Kangli

arXiv.org Artificial Intelligence

In the crucial stages of the Robomaster Youth Championship, the Robomaster EP Robot must operate exclusively on autonomous algorithms to remain competitive. Target recognition and automatic assisted aiming are indispensable for the EP robot. In this study, we use YOLOv5 for multi-object detection to identify the Robomaster EP Robot and its armor. Additionally, we integrate the DeepSORT algorithm for vehicle identification and tracking. As a result, we introduce a refined YOLOv5-based system that allows the robot to recognize and aim at multiple targets simultaneously. To ensure precise tracking, we use a PID controller with Feedforward Enhancement and an FIR controller paired with a Kalman filter. This setup enables quick gimbal movement towards the target and predicts its next position, optimizing potential damage during motion. Our proposed system enhances the robot's accuracy in targeting armor, improving its competitive performance.